Abstract

The FermaT transformation system, based on re- 
search carried out over the last sixteen years at 
Durham University, De Montfort University and Soft- 
ware Migrations Ltd., is an industrial-strength formal 
transformation engine with many applications in pro- 
gram comprehension and language migration. This 
paper is a case study which uses automated plus 
manually-directed transformations and abstractions 
to convert an IBM 370 Assembler code program into 
a very high-level abstract specification. 
2 Introduction

There is a vast collection of operational software 
systems which are vitally important to their users, 
yet are becoming increasingly difficult to maintain, 
enhance and keep up to date with rapidly changing
requirements. For many of these so-called legacy systems
the option of throwing the system away and rewriting
it from scratch is not economically viable. Methods
are therefore urgently required which enable these sys- 
tems to evolve in a controlled manner. In particular, 
legacy assembler systems have high maintenance costs, 
and migrating such systems to a different environment 
(e.g. a client-server architecture) is much more difficult
than for high-level language systems. The FermaT 



transformation system uses formal proven program 
transformations, which preserve or refine the seman- 
tics of a program while changing its form. These 
transformations are applied to restructure and sim- 
plify the legacy systems and to extract higher-level 
representations. 

By using an appropriate sequence of transforma- 
tions, the extracted representation is guaranteed to 
be equivalent to the original code logic. The method 
is based on a formal wide spectrum language, called 
WSL, with accompanying formal method. Over the 
last sixteen years we have developed a large catalogue 
of proven transformations, together with mechanically 
verifiable applicability conditions. These have been 
applied to many software development, reverse engi- 
neering and maintenance problems. 



3 Theoretical Foundation 

The theoretical work on which FermaT is based 
originated not in software maintenance, but in re- 
search on the development of a language in which 
proofs of equivalence for program transformations 
could be achieved as easily as possible for a wide range 
of constructs. 

WSL is the "Wide Spectrum Language" used in 
our program transformation work, which includes low- 
level programming constructs and high-level abstract 
specifications within a single language. This has 
the advantage that one does not need to differenti- 
ate between programming and specification languages: 
the entire transformational development of a program 
from abstract specification to detailed implementation 
can be carried out in a single language. Conversely, 



the entire reverse-engineering process, from a transliteration
of the source program to a high-level specifi-
cation, can also be carried out in the same language. 
During either of these processes, different parts of 
the program may be expressed at different levels of 
abstraction. So a wide-spectrum language forms an 
ideal tool for developing methods for formal program 
development and also for formal reverse engineering 
(for which we have coined the term inverse engineer- 
ing). 

A program transformation is an operation which 
modifies a program into a different form which has 
the same external behaviour (i.e. it is equivalent under 
a precisely defined denotational semantics). Since 
both programs and specifications are part of the same 
language, transformations can be used to demonstrate 
that a given program is a correct implementation of a 
given specification. 

A refinement is an operation which modifies a pro- 
gram to make its behaviour more defined and/or more 
deterministic. A typical implementation of a nonde- 
terministic specification will be a refinement rather 
than a strict equivalence. The opposite of refinement 
is abstraction: we say that a specification is an ab- 
straction of a program which implements it. See [5,6] 
and [1] for a description of refinement. 

The syntax and semantics of WSL are described in 
[8,9,12] so will not be discussed in detail here. Most 
of the constructs in WSL, for example if statements, 
while loops, procedures and functions, are common 
to many programming languages. However there are 
some features relating to the "specification level" of 
the language which are unusual. Expressions and 
conditions (formulae) in WSL are taken directly from 
first order logic; in fact, an infinitary first order logic 
is used (see [4] for details), which allows countably 
infinite disjunctions and conjunctions. This use of
first order logic means that statements in WSL can 
include existential and universal quantification over 
infinite sets, and similar (non-executable) operations. 
Two list operators are also used in specifications: for
a unary function $f$ and list $L = \langle a_1, \ldots, a_n \rangle$ the map
operator $*$ is defined:

$$f * L =_{DF} \langle f(a_1), f(a_2), \ldots, f(a_n) \rangle$$

For a binary operator $g$ and non-empty list $L$ the
reduce operator $/$ is defined:

$$g / L =_{DF} \begin{cases} a_1 & \text{if } n = 1 \\ g(a_1, g/\langle a_2, \ldots, a_n \rangle) & \text{if } \ell(L) > 1 \end{cases}$$

For example, if $f$ is a function which returns integers,
and $L$ is a non-empty list of suitable arguments for $f$,
then $+/f * L$ is the result of applying $f$ to every
element of $L$ and adding up the results. We also
use $\ell(L)$ to denote the length of list $L$ and $L[i\,..\,j]$
to denote the sublist $\langle a_i, \ldots, a_j \rangle$.
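The map and reduce operators can be sketched in Python as follows. This is only an illustration of the definitions above, not part of FermaT or WSL: the names map_op, reduce_op and sublist are introduced here for the example, and the reduce folds from the right exactly as the recursive definition does.

```python
from operator import add


def map_op(f, L):
    """f * L: apply the unary function f to every element of L."""
    return [f(a) for a in L]


def reduce_op(g, L):
    """g / L for a non-empty list L, following the recursive definition."""
    if not L:
        raise ValueError("reduce is only defined for non-empty lists")
    if len(L) == 1:
        return L[0]
    return g(L[0], reduce_op(g, L[1:]))


def sublist(L, i, j):
    """L[i..j] with 1-based, inclusive bounds."""
    return L[i - 1:j]


values = [1, 2, 3, 4]
# +/ square * values: square every element, then add up the results.
total = reduce_op(add, map_op(lambda a: a * a, values))
print(total)
```

For an associative operator such as addition the direction of the fold does not matter; for a non-associative operator the right fold is what the definition prescribes.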

Over the last sixteen years we have been developing 
the WSL language, in parallel with the development 
of a transformation theory and proof methods. Over 
this time the language has developed from a simple 
and tractable kernel language [9] to a complete and 
powerful programming language. At the "low-level" 
end of the language there exist automatic translators 
from IBM Assembler into WSL, and from a subset of 
WSL into C. At the "high-level" end it is possible to
write abstract specifications, similar to Z and VDM. 

In [10,13] program transformations are used to de- 
rive a variety of efficient algorithms from abstract 
specifications. In [10,12,13] the same transformations 
are used in the reverse direction: using transforma- 
tions to derive a concise abstract representation of the 
specification for several challenging programs. 

In [11] we describe a case study using FermaT 
to migrate an assembler program to modular and 
maintainable C code, using purely automatic trans- 
formations with no manual intervention. As far as 
we know, none of the other researchers in program 
transformations (for example, [2,7]) have attempted to 
apply their methods to assembler code. The nearest 
research is Cristina Cifuentes' work on decompilation
and binary translation [Cifuentes CSMROO]. 

In this paper we go even further in the reverse 
engineering process. Starting with the same assembler 
program from [11] we use formal transformations to 
abstract an equivalent high-level specification of the 
program. 

4 Example Transformations in FermaT 

In this section we describe a small number of the 
transformations implemented in FermaT which are 
used in the case study. If $S_1$ and $S_2$ are any WSL
statements and $\Delta$ is any countable set of formulae
with no free variables, then we write $\Delta \vdash S_1 \le S_2$
to denote that $S_2$ is a refinement of $S_1$ whenever all
the formulae in $\Delta$ are true. If $\Delta \vdash S_1 \le S_2$ and
$\Delta \vdash S_2 \le S_1$ then we write $\Delta \vdash S_1 \approx S_2$ and say
that $S_1$ and $S_2$ are equivalent. If $S_2$ is generated from
$S_1$ by a program transformation, then $\Delta \vdash S_1 \approx S_2$,
where $\Delta$ is the set of applicability conditions for the
transformation.



4.1 Expand Forwards 

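If $B$ is any condition and $S_1$, $S_2$ and $S_3$ are any
statements then:

$$\Delta \vdash \mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi};\ S_3 \;\approx\; \mathbf{if}\ B\ \mathbf{then}\ S_1;\ S_3\ \mathbf{else}\ S_2;\ S_3\ \mathbf{fi}$$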

4.2 Loops 

As well as the usual for and while loops, there 
is a notation for unbounded loops. Statements of the 
form do S od, where S is a statement, are "infinite" or 
"unbounded" loops which can only be terminated by 
the execution of a statement of the form exit(n) which 
causes the program to exit the n enclosing loops. We 
use exit as an abbreviation for exit(1). To simplify
the language we disallow exits which leave a block or 
a loop other than an unbounded loop. We also insist 
that n be an integer, not a variable or expression — this 
ensures that we can always determine the target of the 
exit. 

Definition 4.1 Global Substitution 
If $P(S, p)$ is a predicate on a statement $S$ and position
$p$ within $S$, and $S'(S, p)$ is a function which returns
a statement for any given statement $S$ and $p$, then
the effect of replacing or appending to the statement
at position $p$ in $S$ with $S'(S, p)$ for every $p$ such that
$P(S, p)$ holds is denoted:

$$S[S'(S, p)/p \mid P(S, p)]$$

If the statement at position $p$ in $S$ is an exit statement,
then it is replaced by $S'(S, p)$. Otherwise, $S'(S, p)$ is
appended in sequence after the statement at position
$p$.

Within a global substitution we use $\delta(S, p)$ to denote
the depth of a component of a statement. This is
the number of enclosing do ... od loops surrounding
the component. We use $\tau(S, p)$ to denote the terminal
value of a statement. This is the number of
enclosing loops around $S$ which might be terminated
by execution of the statement at position $p$ in $S$. If
the statement at position $p$ in $S$ does not terminate
$S$ then $\tau(S, p) = -1$. For example, any exit(n) has
terminal value $n$. If $S$ contains an exit(n) within $m$
nested loops (where $m \le n$) then the terminal value
of $S$ itself, denoted $\tau(S, \langle\rangle)$, will be at least $n - m$. A
statement $S$ with terminal value zero cannot terminate
any enclosing loops, so the next thing to be executed
after $S$ will be the next statement in the sequence
containing $S$ (if there is one). Such a statement is
called a proper sequence. If $S$ is a proper sequence,
then:

$$\Delta \vdash \mathbf{do}\ \mathbf{if}\ B\ \mathbf{then}\ \mathbf{exit}\ \mathbf{fi};\ S\ \mathbf{od} \;\approx\; \mathbf{while}\ \neg B\ \mathbf{do}\ S\ \mathbf{od}$$
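The equivalence between the unbounded loop and the while loop can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: Python's `while True` with `break` stands in for do ... od with exit, the body S is assumed to be a proper sequence (it never breaks out itself), and the functions B and S below are invented for the example.

```python
def do_od_form(B, S, state):
    """do if B then exit fi; S od"""
    while True:
        if B(state):
            break  # exit(1): leave the single enclosing loop
        S(state)
    return state


def while_form(B, S, state):
    """while not B do S od"""
    while not B(state):
        S(state)
    return state


def B(state):
    # Termination condition (hypothetical).
    return state["i"] >= 5


def S(state):
    # Loop body: a proper sequence, it contains no exit of its own.
    state["total"] += state["i"]
    state["i"] += 1


left = do_od_form(B, S, {"i": 0, "total": 0})
right = while_form(B, S, {"i": 0, "total": 0})
print(left == right)
```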

In the following transformations, the global substitutions
are all applied to the simple terminal statements
of $S$. These are the statements which are
neither a sequence, a conditional, nor a do ... od loop
and which will terminate $S$ if they are executed. For
example, in:

  if B then x := 1; y := 2 else exit fi

the terminal statements are y := 2 and exit. If the
statement is enclosed in a do ... od loop, only the
exit will be a terminal statement.

We usually omit the parameters from $\delta$ and $\tau$ in a
global substitution when these are obvious from the
context.

Definition 4.2 Incrementation 

The incrementation of S by n (where n is any non- 
negative integer) is defined as the incrementation of 
all simple terminal statements in S. An exit is in- 
cremented by incrementing its parameter, while any 
other simple statement is incremented by appending 
an exit: 

$$S + n =_{DF} S[\mathbf{exit}(n + \tau + \delta)/p \mid \tau \ge 0]$$

For example: 

  (if B then x := 1; y := 2 else exit fi) + 2
    = if B then x := 1; y := 2; exit(2) else exit(3) fi

while:

  (do if B then x := 1; y := 2 else exit fi od) + 2
    = do if B then x := 1; y := 2 else exit(3) fi od
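Incrementation can be sketched on a toy syntax tree in Python. This is a minimal illustration and not FermaT's implementation: the classes Assign, Exit, Seq, If and Loop, and the functions increment and show, are all invented here. The `tail` flag records whether falling out of the current component would terminate the whole statement, which is what decides whether an exit that leaves exactly the loops inside S counts as terminal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    text: str


@dataclass(frozen=True)
class Exit:
    n: int = 1


@dataclass(frozen=True)
class Seq:
    items: tuple


@dataclass(frozen=True)
class If:
    cond: str
    then: object
    orelse: object


@dataclass(frozen=True)
class Loop:
    body: object


def increment(s, n, depth=0, tail=True):
    """Return S + n: every simple terminal statement of S is incremented by n."""
    if isinstance(s, Exit):
        # The exit terminates S if it leaves more loops than enclose it inside S,
        # or exactly those loops when the outermost one is in a terminal position.
        terminates = s.n > depth or (s.n == depth and tail)
        return Exit(s.n + n) if terminates else s
    if isinstance(s, Assign):
        # A non-exit statement is terminal only outside all loops, in tail position.
        return Seq((s, Exit(n))) if depth == 0 and tail else s
    if isinstance(s, Seq):
        last = len(s.items) - 1
        out = []
        for i, item in enumerate(s.items):
            r = increment(item, n, depth, tail and (depth > 0 or i == last))
            out.extend(r.items if isinstance(r, Seq) else [r])
        return Seq(tuple(out))
    if isinstance(s, If):
        return If(s.cond,
                  increment(s.then, n, depth, tail),
                  increment(s.orelse, n, depth, tail))
    if isinstance(s, Loop):
        return Loop(increment(s.body, n, depth + 1, tail))
    raise TypeError(f"unknown statement: {s!r}")


def show(s):
    """Render a statement in WSL-like concrete syntax."""
    if isinstance(s, Assign):
        return s.text
    if isinstance(s, Exit):
        return "exit" if s.n == 1 else f"exit({s.n})"
    if isinstance(s, Seq):
        return "; ".join(show(x) for x in s.items)
    if isinstance(s, If):
        return f"if {s.cond} then {show(s.then)} else {show(s.orelse)} fi"
    if isinstance(s, Loop):
        return f"do {show(s.body)} od"
    raise TypeError(f"unknown statement: {s!r}")


cond = If("B", Seq((Assign("x := 1"), Assign("y := 2"))), Exit(1))
print(show(increment(cond, 2)))
print(show(increment(Loop(cond), 2)))
```

The two printed lines correspond to the two worked examples above: outside a loop both y := 2 and the exit are terminal, while inside the loop only the exit is.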

Definition 4.3 Partial Incrementation
The notation $S + (n, m)$ where $m \ge 0$ denotes
incrementation of the terminal statements in $S$ with
terminal value $m$ or greater:

$$S + (n, m) =_{DF} S[\mathbf{exit}(n + \tau + \delta)/p \mid \tau \ge m]$$

Note that $\mathbf{do}\ S\ \mathbf{od} + (n, m) = \mathbf{do}\ S + (n, m + 1)\ \mathbf{od}$.



4.3 Absorption 

For any statements $S_1$ and $S_2$:

$$\Delta \vdash S_1;\ S_2 \;\approx\; S_1[S_2 + \delta / p \mid \tau = 0]$$

For example:

  do if B then x := 1; y := 2 else exit fi od; z := 1
    ≈ do if B then x := 1; y := 2 else z := 1; exit fi od

This transformation can be applied in reverse to "take 
out" code from a loop. 

4.4 False Loop 

We can insert a loop around any statement, by 
incrementing it first: 

$$\Delta \vdash S \;\approx\; \mathbf{do}\ S + 1\ \mathbf{od}$$

(This is a "false loop" because the body of the loop 
can only be executed once). 

4.5 Loop Doubling 

Any loop can be converted to a double loop by the 
last transformation, or by incrementing the body of 
the loop: 

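$$\Delta \vdash \mathbf{do}\ S\ \mathbf{od} \;\approx\; \mathbf{do}\ \mathbf{do}\ S\ \mathbf{od} + 1\ \mathbf{od} \;\approx\; \mathbf{do}\ \mathbf{do}\ S + 1\ \mathbf{od}\ \mathbf{od}$$

More generally, we can arbitrarily decide whether or
not to increment each terminal statement in $S$ with
terminal value zero:

$$\Delta \vdash \mathbf{do}\ S\ \mathbf{od} \;\approx\; \mathbf{do}\ \mathbf{do}\ S[\mathbf{exit}(\delta + \tau + 1)/p \mid \tau > 0 \lor (\tau = 0 \land \Phi(S, p))]\ \mathbf{od}\ \mathbf{od}$$

where $\Phi$ is any condition on $S$ and $p$.

This can be combined with the inverse of absorption
to "isolate" part of a loop body. For example:

  Δ ⊢ do S; if B then S1 else S2 fi od
    ≈ do do S + (1, 1);
            if B then exit else S2 + (1, 1) fi od;
         S1 od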

4.6 Loop Inversion 

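If $S_1$ is a proper sequence then:

$$\Delta \vdash \mathbf{do}\ S_1;\ S_2\ \mathbf{od} \;\approx\; S_1;\ \mathbf{do}\ S_2;\ S_1\ \mathbf{od}$$

More generally, for any statements $S_1$ and $S_2$:

$$\Delta \vdash \mathbf{do}\ S_1;\ S_2\ \mathbf{od} \;\approx\; \mathbf{do}\ S_1;\ \mathbf{do}\ S_2;\ S_1\ \mathbf{od} + 1\ \mathbf{od}$$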



4.7 Loop Unrolling 

We can unroll the first step of a loop:

$$\Delta \vdash \mathbf{do}\ S\ \mathbf{od} \;\approx\; S[\mathbf{do}\ S\ \mathbf{od} + (\delta + 1)/p \mid \tau = 0]\,[\mathbf{exit}(\tau + \delta - 1)/p \mid \tau \ge 1]$$

where the RHS contains two successive global substitutions
on $S$.

More generally, we can insert a copy of the whole
loop, with certain terminal statements of the loop
body incremented, after certain terminal statements
in the loop body. Let $S'$ be formed from $S$ by incrementing
selected terminal statements with terminal
value zero:

$$S' = S[\mathbf{exit}(\delta + 1)/p \mid \tau = 0 \land \Phi(S, p)]$$

where $\Phi$ is any condition (see Section 4.5). Then:

$$\Delta \vdash \mathbf{do}\ S\ \mathbf{od} \;\approx\; \mathbf{do}\ S[\mathbf{do}\ S'\ \mathbf{od} + (\delta + 1)/p \mid \tau = 0 \land \Psi(S, p)]\,[\mathbf{exit}(\tau + \delta - 1)/p \mid \tau \ge 1]\ \mathbf{od}$$

where $\Psi$ is any condition.

5 Modelling Assembler in WSL 

Constructing a useful scientific model necessarily 
involves throwing away some information: in other 
words, to be useful a model must be inaccurate, or 
at least idealised, to a certain extent. For example 
"ideal gases", "incompressible fluids" and "billiard 
ball molecules" are all useful models which gain their 
utility by abstracting away some details of the real 
world. In the case of modelling a programming lan- 
guage, such as Assembler, it is theoretically possible to 
have a perfect model of the language which correctly 
captures the behaviour of all assembler programs. 
Certain features of Assembler, such as branching to 
register addresses, self-modifying code and so on, 
would imply that such a model would have to record 
the entire state of the machine, including all regis- 
ters, memory, disk space, and external devices, and 
"interpret" this state as each instruction is executed. 
(Consider the effect of loading some data from a disk 
file into memory, performing arithmetic at arbitrary 
places in the data, and then branching to the start of 
the data block!) Unfortunately, such a model is useless 
for reverse engineering or migration purposes. 

What we need is a practical model for assembler 
programs which is suitable for reverse engineering, and 
is accurate enough to deal with all the programming 
constructs which are likely to be encountered. 



5.1 Assembler to WSL Translation 

The aim of the assembler to WSL translator is to 
generate WSL code which models as accurately as pos- 
sible the behaviour of the original assembler module, 
without worrying too much about the size, efficiency 
or complexity of the resulting code. Typically, the raw 
WSL translation of an assembler module will be three 
to five times bigger than the source file and have a 
very high McCabe cyclomatic complexity (typically 
in the hundreds, often in the thousands). This is, 
in part, because every "branch to register" instruc- 
tion branches to the dispatch routine, which in turn 
contains branches to every possible return point. In 
addition, every instruction which sets the "condition 
code" flags is translated into WSL code which
assigns an appropriate value to a special variable cc (to
emulate the condition code), whether or not the condition
code is subsequently tested. See [11] for further
details of the assembler to WSL translation process 
and the various features of commercial assembler code 
which it has to deal with. 

However, the FermaT transformation engine in- 
cludes some very powerful transformations for sim- 
plifying WSL code, removing redundancies, tracking 
dispatch codes, and so on. In most cases FermaT 
can automatically unscramble the tangle of "branch 
and save" and "branch to register" code to extract 
self-contained, single-entry single-exit procedures and 
so eliminate the dispatch procedure. In addition, 
FermaT can nearly always eliminate the cc variable 
by constructing appropriate conditional statements. 

6 The Sample Program 

Our sample program is from "A Guided Tour of 
Program Design Methodologies", by G. D. Bergland 
[3] who in turn took it from a story called "Getting it 
Wrong" that has been related by Michael Jackson on 
numerous occasions: 

proc Management_Report =
  var SW1 := 0, SW2 := 0 :
  Produce_Heading;
  read(stuff);
  while NOT eof(stuff) do
    if First_Record_In_Group
      then if SW1 = 1
             then Process_End_Of_Previous_Group
           fi;
           SW1 := 1;
           Process_Start_Of_New_Group;
           Process_Record;
           SW2 := 1
      else
           Process_Record; SW2 := 1
    fi;
    read(stuff)
  od;
  if SW2 = 1 then Process_End_Of_Last_Group
  fi;
  Produce_Summary
end.

The program is a simple report generator which reads 
a sorted transaction file: each transaction contains 
the name of an item and the amount received or dis- 
tributed from the warehouse. The program generates 
a report showing the net change in inventory for each 
item in the transaction file. 

Our resident assembler guru was given the above 
pseudocode and asked to write an assembler imple- 
mentation which uses as many "features" of assembler 
as possible. The result is given in Section 11. (I
should like to point out on his behalf that this is
not his normal coding style!) The program includes
self-modifying code: the "first time through switch"
SW1 is implemented by modifying the branch labelled
LAAA to a NOP in the instruction labelled LAB, and an
EXecute statement has been used to get a variable
length move.

7 Automatic Program Transformation 

The first stage in the transformation process is Data 
Translation. This transformation uses the restruc- 
tured data file to change the data representation in the 
program. Initially all data is accessed directly from 
memory (represented as the byte array a) by adding 
the base register to the displacement to get an address. 
The restructured data file gives the layout of all data in 
memory, so by making some reasonable assumptions 
about non-overlapping DSECTS etc., FermaT is able 
to transform the program into an equivalent program 
where the data is accessed directly through variables 
and structures. For example, consider the "raw WSL" 
statement: 

  !P mvc(a[db(writem, r3), 3 + 1]
         var a[db(wlast, r3), 3 + 1]);

Here, the !P indicates an external procedure call to 
the mvc procedure which implements the MVC (move 
characters) instruction. This moves the given num- 
ber of characters from the given source address to 



the given destination address. The function db(x, y)
simply returns x + y, the displacement plus the base 
register, so the source address is writem + r3 and the 
destination address is wlast + r3. After data transla- 
tion, the same names are used as the actual variables 
and the base registers are eliminated. 

This statement is automatically transformed into 
the simple assignment: 

wlast := wrec.writem; 

In the case of our simple program, there is only one 
structure to uncover: the wrec print record which 
contains fields writem, wrtype and wrqty plus some 
unnamed fillers. 
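The effect of data translation on this one statement can be sketched in Python. This is an illustration under stated assumptions, not the FermaT transformation: the displacements WRITEM and WLAST and the base register value r3 are made-up numbers, and the mvc helper models only the non-overlapping case of the MVC instruction.

```python
def db(displacement, base):
    """Displacement plus base register, as in the raw WSL."""
    return displacement + base


def mvc(memory, dest, src, length):
    """Copy `length` bytes from address src to address dest (non-overlapping case)."""
    memory[dest:dest + length] = memory[src:src + length]


# Hypothetical layout: these numbers are illustrative, not taken from the module.
WRITEM, WLAST = 0x100, 0x200
r3 = 0x1000

memory = bytearray(0x2000)
memory[db(WRITEM, r3):db(WRITEM, r3) + 4] = b"AB12"

# Raw form: every access goes through the byte array via base + displacement.
mvc(memory, db(WLAST, r3), db(WRITEM, r3), 3 + 1)
raw_result = bytes(memory[db(WLAST, r3):db(WLAST, r3) + 4])

# Translated form: named variables and a record, with no base register.
wrec = {"writem": "AB12"}
wlast = wrec["writem"]

print(raw_result, wlast)
```

Both forms leave the same four characters in wlast; the translated form is the one that later transformations can reason about directly.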

The next stage is control flow restructuring: elimi- 
nating non-essential labels and branches, introducing 
loops. This is carried out in a series of passes through 
the program; at each iteration the program is searched
for points where a simplifying transformation (such as 
loop insertion or branch merging) can be applied. The 
iteration is continued until no further improvement 
can be achieved. 

The raw WSL is written as an action system, a 
collection of parameterless procedures (actions) where 
execution of any action will always lead to either
calling another action, or calling the special action Z 
which terminates the whole action system. An action 
system itself is a simple statement, so action systems 
can be nested inside each other, but a sub-action
system cannot call actions in the main system. 

The system then analyses the remaining actions 
to determine which actions may form the body of a 
simple procedure. To do this it uses both control 
flow and data flow analysis. If it determines that 
a collection of actions form a procedure, then these 
actions are extracted out as a sub-action system in 
the body of the procedure. 

After control flow restructuring we have data flow 
analysis: in particular an extended form of constant 
propagation which can propagate return addresses 
through procedure calls. If a dispatch call is encoun- 
tered with a known destination value, then it can be 
unfolded and simplified. The same transformation 
also deals with conditional assignments to the con- 
dition code (cc) in order to remove references to cc 
where possible. 

FermaT was able to extract a collection of actions 
to form the endgroup procedure, so that the code: 

  r10 := 112; call endgroup

becomes:

  r10 := 112; endgroup(); call dispatch



FermaT determines that the value in r10 will be copied
into destination by the body of endgroup. Within 
dispatch the value in destination is compared against 
the offsets of all the possible return points. Offset 112 
is associated with the label lab, so this call dispatch 
can be replaced by call lab. 
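The dispatch simplification can be pictured as a table lookup that is resolved at transformation time whenever the destination is a known constant. The Python sketch below is a loose illustration only: the function simplify_dispatch and the second table entry are hypothetical, and only the association of offset 112 with lab comes from the text.

```python
# Offset 112 -> "lab" is taken from the text; the other entry is hypothetical.
RETURN_POINTS = {112: "lab", 148: "some_other_return_point"}


def simplify_dispatch(known_destination, return_points=RETURN_POINTS):
    """Replace 'call dispatch' by a direct call when the destination is a known constant."""
    label = return_points.get(known_destination)
    if label is None:
        # Destination unknown (or not a return point): keep the general dispatch.
        return "call dispatch"
    return f"call {label}"


print(simplify_dispatch(112))
print(simplify_dispatch(None))
```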

The control flow and data flow restructuring trans- 
formations are iterated until no further improvement 
is possible. Figure 1 lists the metrics for the raw
WSL translation and after automatic restructuring 
and simplifying transformations have been applied. 
This order of magnitude improvement in most of the 
metrics is typical for all sizes of assembler module. See 
[11] for more details of this part of the transformation 
process. 



Metric               Raw WSL   Structured WSL
Statements               561              106
Expressions            1,589              210
McCabe                   184               17
Control/Data Flow        520              156
Branch-Loop              145               17
Structural             6,685              751

Figure 1: Metrics Before and After Transformation



begin
  f_laaa := 1;
  !P open(ddin_ddname, input var os);
  !P open(rdsout_ddname, output var os);
  wprt[1..17] := "MANAGEMENT REPORT";
  write1(); write1();
  wprt[1..20] := "ITEM NET CHANGE";
  write1(); write1();
  xsw1 := 0;
  do r0 := 0; r1 := 0; r15 := 0;
     !P get(ddin_ddname var os, r0, r1, r15, wrec);
     if !XC end_of_file(ddin_ddname)
       then exit(1) fi;
     if wrec.writem ≠ wlast
       then if f_laaa ≠ 1
              then endgroup() fi;
            f_laaa := 0;
            wlast := wrec.writem;
            wnet := !XF zap("hex 0x0C") fi;
     worka := !XF pack(wrec.wrqty, 2);
     if wrec.wrtype ≠ "R"
       then wnet := !XF sp(wnet, worka)
       else wnet := !XF ap(wnet, worka) fi;
     xsw1 := "hex 0xFF" od;
  if xsw1 = "hex 0xFF" then endgroup() fi;
  wprt[1..17] := "NUMBER CHANGED = ";
  !P ed(wchange[1..10] var workb);
  r4 := !XF address_of(workb); r1 := 9;
  do if a[r4, 1] ≠ " " then exit(1) fi;
     r4 := r4 + 1;
     r1 := r1 - 1;
     if r1 = 0 then exit(1) fi od;
  a[!XF address_of(wprt) + 17, r1 + 1]
    := a[r4, r1 + 1];
  write1();
  !P close(ddin_ddname var os);
  !P close(rdsout_ddname var os)
where
  proc endgroup() =
    wprt[1..4] := wlast;
    wsign := "+";
    if wnet < "hex 0x0C" then wsign := "-" fi;
    wprt[8..17] := "hex 0x40206B2020206B202120";
    !P edmk(wnet[1..10] var wprt[8..17], r1);
    r1 := r1 - 1;
    a[r1, 1] := wsign;
    write1(); write1();
    wchange := !XF ap(wchange, "hex 0x1C") end,
  proc write1() =
    !P put(rdsout_ddname, wprt var os);
    wprt := wspaces end
end



8 Abstracting a Specification 

This is about as far as the FermaT system can 
get by purely automatic transformation applications 
with no human intervention. The next step in the 
abstraction process is to change the data representation
so that files become lists. We unfold the write1
procedure and replace zap, ap and sp calls by their
actual operations. We abstract away from the layout 
of the output file by creating a list of the data elements 
that appear on each line of output and appending this 
list to the output array: 

begin
  i := 0; f_laaa := 1;
  output := ⟨⟨"MANAGEMENT REPORT"⟩,
             ⟨"ITEM NET CHANGE"⟩⟩;
  xsw1 := 0;
  do i := i + 1; wrec := input[i];
     if i > n then exit(1) fi;
     if wrec.writem ≠ wlast
       then if f_laaa ≠ 1
              then endgroup() fi;
            f_laaa := 0;
            wlast := wrec.writem;
            wnet := 0 fi;
     if wrec.wrtype ≠ "R"
       then wnet := wnet - wrec.wrqty
       else wnet := wnet + wrec.wrqty fi;
     xsw1 := "hex 0xFF" od;
  if xsw1 = "hex 0xFF" then endgroup() fi;
  output := output ++
            ⟨⟨"NUMBER CHANGED = ", wchange⟩⟩;
where
  proc endgroup() =
    output := output ++ ⟨⟨wlast, wnet⟩⟩;
    wchange := wchange + 1 end
end

We can get rid of the switches xsw1 and f_laaa by
unrolling the first step of the do ... od loop and 
simplifying. We then use loop inversion to move some 
statements to the top of the loop: 

i := i + 1; wrec := input[i];
if i > n
  then skip
  else wlast := wrec.writem;
       wnet := 0;
       do if wrec.wrtype ≠ "R"
            then wnet := wnet - wrec.wrqty
            else wnet := wnet + wrec.wrqty fi;
          i := i + 1; wrec := input[i];
          if wrec.writem ≠ wlast ∨ i > n
            then endgroup();
                 if i > n
                   then exit(1)
                   else wlast := wrec.writem;
                        wnet := 0 fi fi od fi;

We want to roll the two statements wlast :=
wrec.writem; wnet := 0 into the top of the loop, so we
convert the loop to a double-nested loop (loop doubling)
and take the statements out of the inner loop
(take out of loop). Then we apply loop inversion. We can
then take the statements starting with endgroup() out
of the inner loop also:

i := i + 1; wrec := input[i];
if i > n
  then skip
  else do wlast := wrec.writem;
          wnet := 0;
          do if wrec.wrtype ≠ "R"
               then wnet := wnet - wrec.wrqty
               else wnet := wnet + wrec.wrqty fi;
             i := i + 1; wrec := input[i];
             if wrec.writem ≠ wlast ∨ i > n
               then exit(1) fi od;
          endgroup();
          if i > n then exit(1) fi od fi;

Finally, the outer if statement can be removed by 
converting the outer loop to a while loop (this is the 
floop to while transformation): 

i := i + 1; wrec := input[i];
while i ≤ n do
  wlast := wrec.writem;
  wnet := 0;
  do if wrec.wrtype ≠ "R"
       then wnet := wnet - wrec.wrqty
       else wnet := wnet + wrec.wrqty fi;
     i := i + 1; wrec := input[i];
     if wrec.writem ≠ wlast ∨ i > n
       then exit(1) fi od;
  endgroup() od;

Note that, after the initialisation code, the invariant
wrec = input[i] is always true, and for $i > 1$,
wlast = input[i - 1].writem is also true, as is the
invariant wchange = $\ell$(output) - 2. So we can remove
these three variables from the program.

The program now consists of two simple nested 
loops, the outer while loop iterates over the groups 
of records and ends with a call to endgroup(), while 
the inner do ... od loop iterates over the records in 
the group. 

This suggests that we restructure the data to more 
closely match the control structure of the program by 
converting the input array to a list of lists where each 
sublist consists of a single group of data elements, 
so that the outer loop processes sublists one at a 
time and the inner loop processes elements of each 
sublist. The key to the data restructuring is to split 
the input sequence into sections such that the outer 
loop processes one segment per iteration. This is easily 
achieved with a function split(p, B) which splits p into 
non-empty sections with the section breaks occurring 
between those pairs of elements of p where B is false. 
(See [12] for a formal definition of split). In our case, 
the terminating condition on the inner loop provides 
the predicate on which to split: 

funct same_item(x, y) =
  x.writem = y.writem.
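A possible reading of split, written as a Python sketch. The formal definition is in [12]; the version below is only an informal rendering of the description above, with records modelled as dictionaries and the sample data invented for the example.

```python
def split(p, same):
    """Split p into non-empty sections, breaking between adjacent
    elements x, y of p wherever same(x, y) is false."""
    sections = []
    for x in p:
        if sections and same(sections[-1][-1], x):
            sections[-1].append(x)   # continue the current section
        else:
            sections.append([x])     # section break: start a new section
    return sections


def same_item(x, y):
    return x["writem"] == y["writem"]


records = [
    {"writem": "BOLT", "wrtype": "R", "wrqty": 7},
    {"writem": "NUTS", "wrtype": "R", "wrqty": 5},
    {"writem": "NUTS", "wrtype": "D", "wrqty": 2},
]
q = split(records, same_item)
print([len(section) for section in q])
```

Concatenating the sections gives back the original list, which is the property the later data refinement relies on.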

Then the new variable $q$ is introduced with the assignment:
q := split(input, same_item). We index the
$q$ list with two variables $k_1$ and $k_2$ so that $q[k_1][k_2] = \text{input}[i]$.
To do this we preserve the invariant:

$$i = +/(\ell * q[1\,..\,k_1 - 1]) + k_2$$

which, together with the invariant $\text{input} = {+\!\!+}/q$, gives
the required relationship. Adding these ghost variables
to the program we get:

q := split(input, same_item);
i := 1; k1 := 1; k2 := 1;
while i ≤ ℓ(input) do
  wnet := 0;
  do if input[i].wrtype ≠ "R"
       then wnet := wnet - input[i].wrqty
       else wnet := wnet + input[i].wrqty fi;
     i := i + 1;
     k2 := k2 + 1;
     if k2 > ℓ(q[k1]) then k1 := k1 + 1; k2 := 1 fi;
     if input[i].writem ≠ input[i - 1].writem
          ∨ i > ℓ(input) then exit(1) fi od;
  endgroup() od;
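The relationship between the concrete index i and the ghost indices k1 and k2 can be checked mechanically on sample data. The Python sketch below is an assumption-laden illustration: it flattens the two nested loops into one stepping loop, uses 1-based indices as in the WSL, and reuses an informal split helper that is not the formal definition from [12].

```python
def split(p, same):
    """Informal split: break between adjacent elements where same() is false."""
    sections = []
    for x in p:
        if sections and same(sections[-1][-1], x):
            sections[-1].append(x)
        else:
            sections.append([x])
    return sections


def check_invariant(inp):
    """Step i, k1, k2 together and assert the invariant at every position."""
    q = split(inp, lambda x, y: x["writem"] == y["writem"])
    n = len(inp)
    i, k1, k2 = 1, 1, 1
    checked = 0
    while i <= n:
        # i = +/(length * q[1..k1-1]) + k2
        assert i == sum(len(s) for s in q[:k1 - 1]) + k2
        # q[k1][k2] = input[i]
        assert q[k1 - 1][k2 - 1] is inp[i - 1]
        checked += 1
        i += 1
        k2 += 1
        if k2 > len(q[k1 - 1]):
            k1, k2 = k1 + 1, 1
    # i <= n holds exactly when k1 <= length(q), so both run out together.
    assert k1 == len(q) + 1
    return checked


records = [
    {"writem": "BOLT", "wrtype": "R", "wrqty": 7},
    {"writem": "NUTS", "wrtype": "R", "wrqty": 5},
    {"writem": "NUTS", "wrtype": "D", "wrqty": 2},
]
print(check_invariant(records))
```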

We can now replace references to the concrete variables
input and $i$ by references to the new variables $q$,
$k_1$ and $k_2$. The key point is that $i \le \ell(\text{input})$ if and
only if $k_1 \le \ell(q)$ and

  input[i].writem ≠ input[i - 1].writem

is true when we have just moved into a new section
of the input: in other words, precisely when $k_2 = 1$.
So we can remove the concrete variables from the
program:

q := split(input, same_item);
k1 := 1; k2 := 1;
while k1 ≤ ℓ(q) do
  wnet := 0;
  do if q[k1][k2].wrtype ≠ "R"
       then wnet := wnet - q[k1][k2].wrqty
       else wnet := wnet + q[k1][k2].wrqty fi;
     k2 := k2 + 1;
     if k2 > ℓ(q[k1]) then k1 := k1 + 1; k2 := 1 fi;
     if k2 = 1 then exit(1) fi od;
  endgroup() od;

Now the inner loop reduces to a simple for loop: 

q := split(input, same_item);
k1 := 1;
while k1 ≤ ℓ(q) do
  wnet := 0;
  for k2 := 1 to ℓ(q[k1]) step 1 do
    if q[k1][k2].wrtype ≠ "R"
      then wnet := wnet - q[k1][k2].wrqty
      else wnet := wnet + q[k1][k2].wrqty fi od;
  k1 := k1 + 1;
  endgroup() od;



We can express the change to wnet as a function of 
the structure: 

funct change(s) =
  if s.wrtype ≠ "R" then -s.wrqty else s.wrqty fi.

It is clear that the inner loop is computing the sum
of the change outputs for all the structures in the
sublist $q[k_1]$, so we can collapse the inner loop to a reduce
of a map operation:

q := split(input, same_item);
k1 := 1;
while k1 ≤ ℓ(q) do
  wnet := +/change * q[k1];
  k1 := k1 + 1;
  endgroup() od;

The endgroup procedure simply appends an element 
to the output list: 

q := split(input, same_item);
k1 := 1;
while k1 ≤ ℓ(q) do
  wnet := +/change * q[k1];
  output := output ++ ⟨⟨q[k1][1], wnet⟩⟩;
  k1 := k1 + 1 od

so we can collapse the outer loop to a map operation. 
See Section 12 for the final specification.

This extracted specification looks very different 
to the original assembler (see Section 11) but both
programs are semantically equivalent and generate 
identical output files (when the output from the spec- 
ification is formatted to match the assembler). 

9 Conclusion 

This paper describes a particularly challenging re- 
verse engineering task: using formal program transfor- 
mations to extract a high-level abstract specification 
from an IBM 370 assembler program. The original 
assembler program contains several "layers" of com- 
plexity including self-modifying code, a flag used to 
direct control flow, a convoluted control flow struc- 
ture and so on. Fortunately the powerful automatic 
transformations implemented in FermaT allow us to 
remove the first few layers of complexity before we 
even have to look at the program. Moving to higher 
levels of abstraction requires a certain amount of hu- 
man intervention: particularly to select appropriate 
abstract data structures. However, this intervention 
requires only localised analysis of the program. The 



higher-level control flow transformations such as loop 
unrolling, loop rolling, taking code out of loops etc., 
are all implemented in the FermaT system and any
global analysis required by these transformations is 
handled automatically. 
11 The Assembler Source



* TST004A0  SAMPLE PROGRAM (MCDONALDS)  *
*****************************************
*
         REGEQU
*
*        PRINT NOGEN
TST004A0 CSECT
         STM   R14,R12,12(R13)
         LR    R3,R15
         USING TST004A0,R3
         ST    R13,WSAVE+4
         LA    R14,WSAVE
         ST    R14,8(R13)
         LA    R13,WSAVE
*
         OPEN  (DDIN,(INPUT))
         OPEN  (RDSOUT,(OUTPUT))
         MVC   WPRT(17),=CL17'MANAGEMENT REPORT'
         BAL   R10,WRITE1
         BAL   R10,WRITE1
         MVC   WPRT(20),=CL20'ITEM NET CHANGE'
         BAL   R10,WRITE1
         BAL   R10,WRITE1
         MVI   XSW1,0
LAA      EQU   *
         GET   DDIN,WREC
         CLC   WRITEM,WLAST
         BE    LAC
LAAA     B     LAB
         BAL   R10,ENDGROUP
LAB      MVI   LAAA+1,0
         MVC   WLAST,WRITEM
         ZAP   WNET,=P'0'
         BAL   R10,PROCGRP
         MVI   XSW1,X'FF'
         B     LAA
LAC      BAL   R10,PROCGRP
         MVI   XSW1,X'FF'
         B     LAA
LAD      CLI   XSW1,X'FF'
         BNE   LADA
         BAL   R10,ENDGROUP
LADA     EQU   *
         MVC   WPRT(17),=CL17'NUMBER CHANGED = '
         ED    WORKB,WCHANGE
         LA    R4,WORKB
         LA    R1,9
LADB     CLI   0(R4),C' '
         BNE   LADC
         LA    R4,1(R4)
         BCT   R1,LADB
LADC     EX    R1,WMVC1
*WMVC1   MVC   WPRT+17(1),0(R4)
         BAL   R10,WRITE1
*
         CLOSE DDIN
         CLOSE RDSOUT
*
         L     R13,WSAVE+4
         LM    R14,R12,12(R13)
         SLR   R15,R15
         BR    R14
*
PROCGRP  EQU   *
         ST    R10,WST10A
         PACK  WORKA,WRQTY
         CLI   WRTYPE,C'R'
         BNE   LBA
         AP    WNET,WORKA
         B     LBB
LBA      SP    WNET,WORKA
LBB      L     R10,WST10A
         BR    R10
*
ENDGROUP EQU   *
         ST    R10,WST10A
         MVC   WPRT(4),WLAST
         MVI   WSIGN,C'+'
         CP    WNET,=P'0'
         BNL   LCA
         MVI   WSIGN,C'-'
LCA      EQU   *
         MVC   WPRT+7(10),=X'40206B20202C
         EDMK  WPRT+7(10),WNET
         BCTR  R1,0
         MVC   0(1,R1),WSIGN
         BAL   R10,WRITE1
         BAL   R10,WRITE1
         AP    WCHANGE,=P'1'
         L     R10,WST10A
         BR    R10
*
WRITE1   EQU   *
         PUT   RDSOUT,WPRT
         MVC   WPRT,WSPACES
         BR    R10
*
WMVC1    MVC   WPRT+17(1),0(R4)
*
WSAVE    DC    18F'0'
WST10A   DS    F
WREC     DS    0CL80
WRITEM   DS    CL4
         DS    CL1
WRTYPE   DS    CL1
         DS    CL1
WRQTY    DS    CL3
         DS    CL70
WPRT     DC    CL80' '
WSPACES  DC    CL80' '
WLAST    DC    CL4'****'
WCHANGE  DC    PL4'0'
WNET     DC    PL4'0'
WORKA    DC    PL2'0'
WORKB    DC    XL10'40206B2020206B202120'
WSIGN    DC    CL1' '
XSW1     DC    X'0C'
*
         LTORG
*
DDIN     DCB   DDNAME=DDIN,
               DSORG=PS,
               EODAD=LAD,
               MACRF=GM
RDSOUT   DCB   DDNAME=RDSOUT,
               DSORG=PS,
               MACRF=PM
*
         END


12 The WSL Specification 

begin
  q := split(input, same_item);
  output := header ++ process * q
              ++ ⟨⟨"NUMBER CHANGED = ", ℓ(q)⟩⟩
where

  funct same_item(x, y) = x.writem = y.writem.
  funct process(L) = ⟨L[1], +/change * L⟩.
  funct change(s) =
    if s.wrtype ≠ "R" then -s.wrqty else s.wrqty fi.
end 
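To make the claimed equivalence tangible, the specification can be transcribed into Python and compared with a loop-and-switch version in the style of the intermediate WSL program. This is a sketch under stated assumptions, not the authors' tooling: records are dictionaries, each output line is a tuple, the group line carries the item name (as wlast does in endgroup), and loop_version is a hand-written paraphrase of the abstracted program rather than a translation of the assembler.

```python
def split(p, same):
    """Informal split: break between adjacent elements where same() is false."""
    sections = []
    for x in p:
        if sections and same(sections[-1][-1], x):
            sections[-1].append(x)
        else:
            sections.append([x])
    return sections


def same_item(x, y):
    return x["writem"] == y["writem"]


def change(s):
    return -s["wrqty"] if s["wrtype"] != "R" else s["wrqty"]


def process(L):
    # Item name of the group and the reduce-of-map  +/change * L.
    return (L[0]["writem"], sum(change(s) for s in L))


HEADER = [("MANAGEMENT REPORT",), ("ITEM NET CHANGE",)]


def specification(inp):
    q = split(inp, same_item)
    return HEADER + [process(L) for L in q] + [("NUMBER CHANGED = ", len(q))]


def loop_version(inp):
    """Paraphrase of the program with the two switches still in place."""
    output = list(HEADER)
    first_group, any_record = True, False
    wlast, wnet, wchange = None, 0, 0
    for wrec in inp:
        if wrec["writem"] != wlast:
            if not first_group:
                output.append((wlast, wnet))   # endgroup()
                wchange += 1
            first_group = False
            wlast, wnet = wrec["writem"], 0
        wnet += change(wrec)
        any_record = True
    if any_record:
        output.append((wlast, wnet))           # endgroup() for the last group
        wchange += 1
    output.append(("NUMBER CHANGED = ", wchange))
    return output


records = [
    {"writem": "BOLT", "wrtype": "R", "wrqty": 7},
    {"writem": "NUTS", "wrtype": "R", "wrqty": 5},
    {"writem": "NUTS", "wrtype": "D", "wrqty": 2},
]
print(specification(records) == loop_version(records))
```

Agreement on a few sample inputs is of course only a sanity check; the equivalence itself rests on the transformation steps in Section 8.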



